[...] $+ \lambda \int_a^b [\mu''(t)]^2\,dt$, where the data are $t_j, Y_j$, $j = 1, \ldots, n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite-dimensional space to minimizing over a finite-dimensional space occurs for more general objective functions: the data may [...] take a different form. This paper reviews the Reproducing Kernel Hilbert [...] linear differential operators. In this case, one can sometimes easily calculate the minimizer explicitly, using Green's functions.
1. Introduction

A Reproducing Kernel Hilbert Space (RKHS) provides a practical and elegant structure to solve optimization problems in function spaces. This article considers the use of an RKHS to analyze the data $Y_1, \ldots, Y_n \in \mathbb{R}$ and $t_1, \ldots, t_n \in \mathbb{R}^p$. The distribution of the $Y_i$'s depends on $\mu$, a function of $t \in \mathbb{R}^p$ which is usually assumed to be smooth. The goal is to find $\mu$ in a specified function space $\mathcal{H}$ to minimize

\[ G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu), \ldots, F_n(\mu)) + \lambda P(\mu) \tag{1.1} \]

where $G$ and the $F_j$'s are known, $P$ is a known penalty on $\mu$, and $\lambda$ serves to balance the importance between $G$ and $P$. Typically, $F_j(\mu) = \mu(t_j)$ and $P(\mu)$ is based on derivatives of $\mu$. Some results here will concern general $P$ and $t_j \in \mathbb{R}^p$, $p \ge 1$, and some more extensive results will concern $t_j \in [a,b] \subset \mathbb{R}$ and $P$ generated from a differential operator $L$:

\[ P(\mu) = \int_a^b [(L\mu)(t)]^2\,dt \quad \text{where} \quad (L\mu)(t) = \mu^{(m)}(t) + \sum_{j=0}^{m-1} \omega_j(t)\,\mu^{(j)}(t) \tag{1.2} \]

with $\omega_j$ real-valued and continuous. For this type of penalty, we restrict $\mu$ to lie in the space

\[ \mathcal{H}^m[a,b] = \Big\{ f : [a,b] \to \mathbb{R} : f^{(j)},\ j = 0, \ldots, m-1, \text{ are absolutely continuous and } \int_a^b [f^{(m)}(t)]^2\,dt < \infty \Big\}. \]

Note that, for all $\mu \in \mathcal{H}^m[a,b]$, $\int_a^b [(L\mu)(t)]^2\,dt$ is well defined: $L\mu(t)$ exists for almost every $t$ and $L\mu$ is square integrable, since the $\omega_j$'s are continuous and $[a,b]$ is finite.
The most well-known case of (1.1) occurs in regression analysis, when we seek the regression function $\mu \in \mathcal{H}^2[a,b]$ to minimize

\[ \sum_j [Y_j - \mu(t_j)]^2 + \lambda \int_a^b [\mu''(t)]^2\,dt. \tag{1.3} \]

The minimizing $\mu$ is a cubic smoothing spline, a popular regression function estimate. The non-negative smoothing parameter $\lambda$ balances the minimizing $\mu$'s fit to the data (via minimizing $\sum_j [Y_j - \mu(t_j)]^2$) with its closeness to a straight line (achieved when $\int_a^b [\mu''(t)]^2\,dt = 0$). The value of $\lambda$ is typically chosen "by eye", that is, by examining the resulting estimates of $\mu$, or by some automatic data-driven method such as cross-validation. See, for instance, Wahba [23], Eubank [7] or Green and Silverman [8].

To extend (1.3) to (1.1), we can consider a first term other than a sum of
squares, functionals other than $F_j(\mu) = \mu(t_j)$ and a differential operator other
than the second derivative operator. Examples of these variations are given in
Section 2. Section 3 contains the reduction of (1.1) to a finite dimensional opti- 
mization problem. Section 4 relates the minimizer of (1.1) to a Bayes estimate. 
Sections 5 and 6 contain results and algorithms for minimizing (1.1) with P 
as in (1.2), with Section 5 containing the "warm-up" of the cubic smoothing 
spline result for minimizing (1.3) and Section 6 containing the general case. The 
Appendix contains pertinent results from the theory of solutions of differential 
equations. 

The material contained here is, for the most part, not original. The material 
is drawn from many sources: from statistical and machine learning literature, 
from the theory of differential equations, from numerical analysis, and from 
functional analysis. The purpose of this paper is to collect this diverse material 
in one article and to present it in an easily accessible form, to show the richness 
of statistical problems that involve minimizing (1.1) and to explain the theory 
and provide easy to follow algorithms for minimizing (1.1). A briefer review of 
RKHS's can be found in Wahba [24]. 

2. Examples 

2.1. Penalized likelihoods with $F_j(f) = f(t_j)$

Most statistical applications that lead to minimizing (1.1) have the first term in (1.1) equal to a negative log likelihood. In these cases, the $\mu$ that minimizes (1.1) is called a penalized likelihood estimate of $\mu$. Indeed, (1.3) yields a penalized likelihood estimator: the sum of squares arises from a likelihood by assuming that $Y_1, \ldots, Y_n$ are independent and normally distributed, with the mean of $Y_j$ equal to $\mu(t_j)$ and the variance equal to $\sigma^2$. Then $-2 \times$ the log likelihood is, up to an additive constant, simply

\[ n \log(\sigma^2) + \frac{1}{\sigma^2} \sum_j [Y_j - \mu(t_j)]^2. \]

A penalized likelihood estimate of $\mu$ with penalty $P$ minimizes

\[ n \log(\sigma^2) + \frac{1}{\sigma^2} \sum_j [Y_j - \mu(t_j)]^2 + \lambda^* P(\mu). \]

Thus, for a given $\sigma^2$, the penalized likelihood estimate of $\mu$ minimizes (1.1) with $\lambda = \lambda^* \sigma^2$. If the $Y_j$'s are not independent but the vector $(Y_1, \ldots, Y_n)'$ has covariance matrix $\sigma^2 \Sigma$, then we would replace the sum of squares with

\[ \sum_{j,k} [Y_j - \mu(t_j)]\; \Sigma^{-1}[j,k]\; [Y_k - \mu(t_k)]. \]

Another likelihood, important in classification, is based on data $Y_j = 1$ or $-1$ with probabilities $p(t_j)$ and $1 - p(t_j)$, respectively. Thus

\[ \text{the log likelihood} = \sum_j \left\{ \frac{1 + Y_j}{2} \log p(t_j) + \frac{1 - Y_j}{2} \log[1 - p(t_j)] \right\}. \]

To avoid placing inequality constraints on the function of interest, we reparameterize by setting $\mu(t) = \log[p(t)/(1 - p(t))]$ or equivalently $p(t) = \exp(\mu(t))/[1 + \exp(\mu(t))]$. This reparameterization yields

\[ \text{the log likelihood} = \sum_j \left\{ \frac{1 + Y_j}{2} \log \frac{\exp(\mu(t_j))}{1 + \exp(\mu(t_j))} + \frac{1 - Y_j}{2} \log \frac{1}{1 + \exp(\mu(t_j))} \right\}. \tag{2.1} \]




2.2. $F_j$'s based on integrals

While $F_j(\mu) = \mu(t_j)$ is common, $F_j(\mu)$ is sometimes chosen to involve an integral of $\mu$, specifically, $F_j(\mu) = \int H(s, t_j)\,\mu(s)\,ds$, with $H$ known. See Wahba [23].

Li [15] and Bacchetti et al. [5] used (1.1) to estimate $\mu(t)$, the HIV infection rate at time $t$, based on data, $Y_j$, the number of new AIDS cases diagnosed in time period $(t_{j-1}, t_j]$. The expected value of $Y_j$ depends not only on $\mu(t_j)$, but also on $\mu(t)$ for values of $t < t_j$. This dependence involves the distribution of the time of progress from HIV infection to AIDS diagnosis, which is estimated from cohort studies. Letting $F(t|s)$ denote the probability that AIDS has developed by time $t$ given HIV infection occurred at time $s$,

\[ E(Y_1 + \cdots + Y_j) = \int_{s \le t_j} \mu(s)\, F(t_j|s)\,ds \equiv F_j(\mu). \]

Thus we could define the first term in (1.1) as a negative log likelihood assuming the $Y_j$'s are independent Poisson counts with $E(Y_j) = F_j(\mu) - F_{j-1}(\mu)$. Or we could take the computationally simpler approach by setting the first term in (1.1) equal to

\[ \sum_j \big\{ Y_j - [F_j(\mu) - F_{j-1}(\mu)] \big\}^2. \]

Both Li [15] and Bacchetti et al. [5] use this simpler approach, with the former using penalty $P(\mu) = \int (\mu'')^2$ while the latter used a discretized version of $\int (\mu'')^2$.

In a non-regression setting, Nychka et al. [16] estimated the distribution of the volumes of tumours in livers by using data from cross-sectional slices of the livers. The authors modelled tumours as spheres and so cross-sections were circles. They estimated $\mu$, the probability density of the spheres' radii, using an integral to relate the radius of a sphere to the radius of a random slice of the sphere. Their estimation criterion was the minimization of an expression of the form (1.1) with $F_j$ using that integral and with $P(\mu) = \int (\mu'')^2$.

2.3. Support vector machines

Support vector machines are a classification tool, with classification rules built from data $Y_i \in \{-1, 1\}$, $t_i \in \mathbb{R}^p$ (see, for instance, Hastie et al. [9]). The goal is to find a function $\mu$ for classifying: classify $Y_i$ as 1 if and only if $\mu(t_i) > 0$. We see that $Y_i$ is misclassified by this rule whenever $Y_i \mu(t_i)$ is negative, and correctly classified whenever $Y_i \mu(t_i)$ is positive. Thus, it is common to find $\mu$ to minimize $\sum_i \mathrm{sign}[-Y_i \mu(t_i)]$ subject to some penalty for rough $\mu$: that is, to find $\mu$ to minimize

\[ \sum_i \mathrm{sign}[-Y_i \mu(t_i)] + \lambda P(\mu). \]

This can be made more general by minimizing

\[ \sum_j H[Y_j \mu(t_j)] + \lambda P(\mu) \]

for a known non-increasing function $H$. The function $H(x) = \mathrm{sign}(-x)$ is not continuous at 0, which can make minimization challenging. To avoid this problem, Wahba [22] proposed using "softer" $H$ functions, such as $H(x) = \ln[1 + \exp(-x)]$. This function is not only continuous, but is differentiable and convex. Wahba [22] showed that this $H$ corresponds to a negative log likelihood. Specifically, she showed that the log likelihood in (2.1) is equal to $-\sum_j \log\{1 + \exp[-Y_j \mu(t_j)]\}$.
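
The identity between the log likelihood (2.1) and the softened loss can be checked numerically. The following is only a small sketch: the labels and the values standing in for $\mu(t_j)$ are made up for illustration, and the function names are hypothetical.

```python
import math

def log_lik_2_1(y, mu):
    """Log likelihood (2.1) for labels y_j in {-1, 1} and values mu_j = mu(t_j)."""
    total = 0.0
    for yj, mj in zip(y, mu):
        p = math.exp(mj) / (1.0 + math.exp(mj))      # p(t_j) from the reparameterization
        total += (1 + yj) / 2 * math.log(p) + (1 - yj) / 2 * math.log(1.0 - p)
    return total

def neg_soft_loss(y, mu):
    """-sum_j log(1 + exp(-y_j * mu_j)), i.e. minus the sum of H[y_j mu_j]."""
    return -sum(math.log(1.0 + math.exp(-yj * mj)) for yj, mj in zip(y, mu))

# Illustrative (made-up) labels and function values.
y = [1, -1, -1, 1, 1]
mu = [0.3, -1.2, 0.8, 2.0, -0.5]
print(log_lik_2_1(y, mu), neg_soft_loss(y, mu))
```

The two printed numbers should agree up to rounding, since each term of (2.1) reduces to $-\log\{1 + \exp[-Y_j \mu(t_j)]\}$ for both $Y_j = 1$ and $Y_j = -1$.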

2.4. Using different differential operators in the penalty

Ansley, Kohn, and Wong [3] and Heckman and Ramsay [10] demonstrated the usefulness of appropriate choices of $L$ in the penalty $P(\mu) = \int (L\mu)^2$. For instance, Heckman and Ramsay compared two estimates of a regression function for the incidence of melanoma in males. The data, described in Andrews and Herzberg [1], are from the Connecticut Tumour Registry, for the years 1936 to 1972. The data show a roughly periodic trend superimposed on an increasing trend. A cubic smoothing spline, the minimizer of (1.3), tracks the data fairly well, but slightly dampens the periodic component. This dampening does not occur with Heckman and Ramsay's preferred estimate, the estimate that minimizes a modified version of (1.3) in which the penalty $\int [\mu''(t)]^2\,dt$ is replaced by the penalty $\int [\mu^{(4)}(t) + \omega^2 \mu''(t)]^2\,dt$ with $\omega = 0.58$. The differential operator $L = D^4 + \omega^2 D^2$ was chosen since it places no penalty on functions of the form $\mu(t) = \alpha_1 + \alpha_2 t + \alpha_3 \cos \omega t + \alpha_4 \sin \omega t$: such functions are exactly the functions satisfying $L\mu \equiv 0$ and form a popular parametric model for fitting melanoma data. The value of $\omega$ was chosen by a nonlinear least squares fit to this parametric model.

The use of appropriate differential operators in the penalty has been further developed in the field of Dynamic Analysis. See, for instance, Ramsay et al. [17]. These authors use differential operators equal to those used by subject area researchers, who typically work in the finite dimensional space defined by solutions of $L\mu = 0$.
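
As a short worked check of the claim about $L = D^4 + \omega^2 D^2$, one can apply the operator to each of the four functions in the parametric model. This is only a verification of the statement in the text, not part of the cited authors' derivation.

```latex
\begin{aligned}
L\,1 &= 0, \qquad L\,t = 0, \\
L \cos\omega t &= \omega^4 \cos\omega t + \omega^2\,(-\omega^2 \cos\omega t) = 0, \\
L \sin\omega t &= \omega^4 \sin\omega t + \omega^2\,(-\omega^2 \sin\omega t) = 0 .
\end{aligned}
```

Since $L$ is a linear operator of order four, the solutions of $L\mu = 0$ form a four-dimensional space, so these four linearly independent functions span it.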

3. Results for the general minimization problem 

This section contains some background on Reproducing Kernel Hilbert Spaces 
and shows how to use Reproducing Kernel Hilbert Space structure to reduce the 
minimization of (1.1) to minimization over a finite-dimensional function space 
(see Theorem 3.1). Whether or not the minimizer exists can be determined 
by studying the finite-dimensional version. While a complete review of Hilbert 
spaces is beyond the scope of this article, a few definitions may help the reader. 
Further background on Hilbert spaces can be found in any standard functional 
analysis textbook, such as Kolmogorov and Fomin [13] or Kreyszig [14]. For a 
condensed exposition of the necessary Hilbert space theory, see, for instance, 
Wahba [23], [24] or the appendix of Thompson and Tapia [21]. We will only 
consider Hilbert spaces over $\mathbb{R}$.

Consider $\mathcal{H}$, a collection of functions from $\mathcal{T}$ to $\mathbb{R}$. Suppose that $\mathcal{H}$ is a vector space over $\mathbb{R}$ with inner product $\langle \cdot, \cdot \rangle$. The inner product induces a norm on $\mathcal{H}$, namely $\|f\| = [\langle f, f \rangle]^{1/2}$. The existence of a norm allows us to define limits of sequences in $\mathcal{H}$ and continuity of functions with arguments in $\mathcal{H}$. The vector space $\mathcal{H}$ is a Hilbert space if it is complete with respect to this norm, that is, if any Cauchy sequence in $\mathcal{H}$ converges to an element of $\mathcal{H}$.

A linear functional $F$ is a function from a Hilbert space $\mathcal{H}$ to the reals satisfying $F(\alpha f + \beta g) = \alpha F(f) + \beta F(g)$ for all $\alpha, \beta \in \mathbb{R}$ and all $f, g \in \mathcal{H}$. The Riesz Representation Theorem states that a linear functional $F$ is continuous on $\mathcal{H}$ if and only if there exists $\eta \in \mathcal{H}$ such that $\langle \eta, f \rangle = F(f)$ for all $f \in \mathcal{H}$. The function $\eta$ is called the representer of $F$.

The Hilbert space $\mathcal{H}$ is a Reproducing Kernel Hilbert Space if and only if, for all $t \in \mathcal{T}$, the linear functional $F_t(f) = f(t)$ is continuous, that is, if and only if, for all $t \in \mathcal{T}$, there exists $R_t \in \mathcal{H}$ such that $\langle R_t, f \rangle = f(t)$ for all $f \in \mathcal{H}$. Noting that the collection of $R_t$'s, $t \in \mathcal{T}$, defines a bivariate function $R$, namely $R(s,t) = R_t(s)$, we see that $\mathcal{H}$ is a Reproducing Kernel Hilbert Space if and only if there exists a bivariate function $R$ defined on $\mathcal{T} \times \mathcal{T}$ such that $\langle R(\cdot,t), f \rangle = f(t)$ for all $f \in \mathcal{H}$ and all $t \in \mathcal{T}$. The function $R$ is called the reproducing kernel of $\mathcal{H}$.

One can show that the reproducing kernel is symmetric in its arguments, as follows. To aid the proof, use the notation that $R_t(s) = R(s,t)$ and $R_s(t) = R(t,s)$. By the reproducing properties of $R_t$ and $R_s$, $\langle R_t, R_s \rangle = R_s(t)$ and $\langle R_s, R_t \rangle = R_t(s)$. But the inner product is symmetric, that is, $\langle R_t, R_s \rangle = \langle R_s, R_t \rangle$. So $R_s(t) = R_t(s)$.

To give the form of the finite-dimensional minimizer of (1.1), we assume that the following conditions hold.

(C.1) There are $\mathcal{H}_0$ and $\mathcal{H}_1$, linear subspaces of $\mathcal{H}$, with $\mathcal{H}_1$ the orthogonal complement of $\mathcal{H}_0$.

(C.2) $\mathcal{H}_0$ is of dimension $m < \infty$, with basis $u_1, \ldots, u_m$. If $m = 0$, take $\mathcal{H}_0$ equal to the trivial subspace $\{0\}$ and $\mathcal{H}_1 = \mathcal{H}$.

(C.3) There exist $R_0$ and $R_1$, with $R_i(\cdot,t) \in \mathcal{H}_i$, such that $R_i$ is a reproducing kernel for $\mathcal{H}_i$, in the sense that $\langle R_i(\cdot,t), f_i \rangle = f_i(t)$ for all $f_i \in \mathcal{H}_i$, $i = 0, 1$.

Since $\mathcal{H}_0$ is finite dimensional, it is closed. The orthogonal complement of a subspace is always closed. Thus Condition (C.1) implies that any $\mu \in \mathcal{H}$ can be written as $\mu = \mu_0 + \mu_1$ for some $\mu_0 \in \mathcal{H}_0$ and $\mu_1 \in \mathcal{H}_1$ and that $\langle \mu_0, \mu_1 \rangle = 0$. This is often written as $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$. Note that Conditions (C.1), (C.2) and (C.3) imply that $R = R_0 + R_1$ is a reproducing kernel for $\mathcal{H}$.

We require one more condition, relating the penalty $P$ to the partition of $\mathcal{H}$.

(C.4) Write $\mu = \mu_0 + \mu_1$, with $\mu_i \in \mathcal{H}_i$. Then $P(\mu) = \langle \mu_1, \mu_1 \rangle$.

Theorem 3.1. Suppose that conditions (C.1) through (C.4) hold and that $F_1, \ldots, F_n$ are continuous linear functionals on $\mathcal{H}$. Let $\eta_{j1}(t) = F_j(R_1(\cdot,t))$, that is, $F_j$ applied to the function $R_1$ considered as a function of $s$, with $t$ fixed. Then to minimize (1.1), it is necessary and sufficient to find

\[ \mu(t) \equiv \mu_0(t) + \mu_{11}(t) \equiv \sum_{j=1}^m \alpha_j u_j(t) + \sum_{j=1}^n \beta_j \eta_{j1}(t) \]

where the $\alpha_j$'s and $\beta_j$'s minimize

\[ G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu_0 + \mu_{11}), \ldots, F_n(\mu_0 + \mu_{11})) + \lambda \beta' K \beta. \]

Here $\beta = (\beta_1, \ldots, \beta_n)'$ and the matrix $K$ is symmetric and non-negative definite, with $K[j,k] = F_j(\eta_{k1})$. If $F_j(f) = f(t_j)$ and $F_k(f) = f(t_k)$, then $\eta_{j1}(t) = R_1(t_j,t)$, $\eta_{k1}(t) = R_1(t_k,t)$ and $K[j,k] = R_1(t_j,t_k)$.

Proof. By the Riesz Representation Theorem, there exists a representer $\eta_j \in \mathcal{H}$ such that $\langle \eta_j, \mu \rangle = F_j(\mu)$ for all $\mu \in \mathcal{H}$. Applying the Riesz Representation Theorem to the subspaces $\mathcal{H}_0$ and $\mathcal{H}_1$, which can be considered as Hilbert spaces in their own rights, there exist $\eta_{j0} \in \mathcal{H}_0$ and $\eta^*_{j1} \in \mathcal{H}_1$, representers of $F_j$ in the sense that $\langle \eta_{j0}, \mu \rangle = F_j(\mu)$ for all $\mu \in \mathcal{H}_0$ and $\langle \eta^*_{j1}, \mu \rangle = F_j(\mu)$ for all $\mu \in \mathcal{H}_1$. One easily shows that this $\eta^*_{j1}$ is equal to $\eta_{j1}$, as defined in the statement of the Theorem: by the definition of the representer of $F_j$, $\eta^*_{j1}$ must satisfy $F_j(R_1(\cdot,t)) = \langle \eta^*_{j1}, R_1(\cdot,t) \rangle$. But, by the reproducing quality of $R_1$, $\langle \eta^*_{j1}, R_1(\cdot,t) \rangle = \eta^*_{j1}(t)$. So $\eta^*_{j1} = \eta_{j1}$. One also easily shows that $\eta_j = \eta_{j0} + \eta_{j1}$.

imsart-generic ver. 2009/08/13 file: RKHS_arxiv.tex date: November 9, 2011

N. Heckman/RKHS Made Easy


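We use the $\eta_{j1}$'s to partition $\mathcal{H}_1$ as follows. Let $\mathcal{H}_{11}$ be the finite dimensional subspace of $\mathcal{H}_1$ spanned by $\eta_{j1}$, $j = 1, \ldots, n$, and let $\mathcal{H}_{12}$ be the orthogonal complement of $\mathcal{H}_{11}$ in $\mathcal{H}_1$. Then $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_{11} \oplus \mathcal{H}_{12}$ and so any $\mu \in \mathcal{H}$ can be written as

\[ \mu = \mu_0 + \mu_{11} + \mu_{12} \quad \text{with } \mu_0 \in \mathcal{H}_0 \text{ and } \mu_{1k} \in \mathcal{H}_{1k},\ k = 1, 2. \]

We now show that any minimizer of (1.1) must have $\mu_{12} \equiv 0$. Let $\mu$ be any element of $\mathcal{H}$. Since $\eta_j$ is the representer of $F_j$ and $\mu_{12}$ is orthogonal to $\eta_j$,

\[ F_j(\mu) = \langle \eta_j, \mu \rangle = \langle \eta_j, \mu_0 + \mu_{11} + \mu_{12} \rangle = \langle \eta_j, \mu_0 + \mu_{11} \rangle = F_j(\mu_0 + \mu_{11}). \]

Therefore, $\mu_{12}$ is irrelevant in computing the first term in (1.1). To study the second term in (1.1), by (C.4) and the orthogonality of $\mu_{11}$ and $\mu_{12}$,

\[ P(\mu) = \langle \mu_1, \mu_1 \rangle = \langle \mu_{11}, \mu_{11} \rangle + \langle \mu_{12}, \mu_{12} \rangle. \]

Therefore, we want to find $\mu_0 \in \mathcal{H}_0$, $\mu_{11} \in \mathcal{H}_{11}$ and $\mu_{12} \in \mathcal{H}_{12}$ to minimize

\[ G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu_0 + \mu_{11}), \ldots, F_n(\mu_0 + \mu_{11})) + \lambda \left[ \langle \mu_{11}, \mu_{11} \rangle + \langle \mu_{12}, \mu_{12} \rangle \right]. \]

Clearly, we should take $\mu_{12}$ to be the zero function and so any minimizer of (1.1) is of the form

\[ \mu(t) = \mu_0(t) + \mu_{11}(t) = \sum_{j=1}^m \alpha_j u_j(t) + \sum_{j=1}^n \beta_j \eta_{j1}(t). \]

Now consider rewriting $P(\mu)$ as $\beta' K \beta$: $P(\mu) = \langle \mu_{11}, \mu_{11} \rangle = \sum_{j,k} \beta_j \beta_k \langle \eta_{j1}, \eta_{k1} \rangle = \beta' K^* \beta$ for $K^*$ symmetric and non-negative definite. To show that $K^*[j,k] = F_j(\eta_{k1})$, use the fact that $\eta_{j1}$ is the representer of $F_j$ in $\mathcal{H}_1$, that is, that $\langle \eta_{j1}, f \rangle = F_j(f)$ for all $f \in \mathcal{H}_1$. Applying this to $f = \eta_{k1}$ yields the desired result, that $\langle \eta_{j1}, \eta_{k1} \rangle = F_j(\eta_{k1})$.

Consider the case that $F_j(f) = f(t_j)$ and $F_k(f) = f(t_k)$. Then $\eta_{j1}(t) = F_j(R_1(\cdot,t)) = R_1(t_j,t)$, $\eta_{k1}(t) = R_1(t_k,t)$, and $K[j,k] = F_j(\eta_{k1}) = R_1(t_k,t_j) = R_1(t_j,t_k)$ by symmetry of $R_1$. □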

The proof of the following Corollary is immediate, by taking $m = 0$ in (C.2).

Corollary 3.1. Suppose that $\mathcal{H}$ is an RKHS with inner product $\langle \cdot, \cdot \rangle$ and reproducing kernel $R$. In (1.1), suppose that $P(\mu) = \langle \mu, \mu \rangle$ and assume that the $F_j$'s are continuous linear functionals. Then the minimizer of (1.1) is of the form

\[ \mu(t) = \sum_{j=1}^n \beta_j F_j(R(\cdot,t)). \]
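
To make the finite-dimensional reduction concrete, here is a minimal sketch of Corollary 3.1 for the special case where $G$ is a sum of squares and $F_j(f) = f(t_j)$, so that the objective becomes $\|Y - K\beta\|^2 + \lambda \beta' K \beta$ with $K[j,k] = R(t_j,t_k)$. The kernel $R(s,t) = 1 + \min\{s,t\}$ on $[0,1]$, the simulated data and all function names are assumptions chosen for illustration; setting the gradient to zero shows that $\beta = (K + \lambda I)^{-1} Y$ is a minimizer.

```python
import numpy as np

def kernel(s, t):
    # Assumed reproducing kernel R(s, t) = 1 + min(s, t) on [0, 1] (illustrative choice).
    return 1.0 + np.minimum(s, t)

def fit(t_obs, y, lam):
    """Return beta solving (K + lam I) beta = y, and the matrix K[j, k] = R(t_j, t_k)."""
    K = kernel(t_obs[:, None], t_obs[None, :])
    beta = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return beta, K

def predict(t_new, t_obs, beta):
    """Evaluate mu(t) = sum_j beta_j R(t_j, t) at the points t_new."""
    return kernel(np.asarray(t_new, dtype=float)[:, None], t_obs[None, :]) @ beta

def objective(beta, K, y, lam):
    """Sum of squares plus lam * beta' K beta, the finite-dimensional form of (1.1)."""
    r = y - K @ beta
    return r @ r + lam * beta @ K @ beta

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.05, 0.95, 10)
y = np.sin(2 * np.pi * t_obs) + 0.1 * rng.normal(size=t_obs.size)
lam = 0.1
beta, K = fit(t_obs, y, lam)
print(predict([0.25, 0.5, 0.75], t_obs, beta))
```

The whole fit requires only an $n \times n$ linear solve, even though the minimization is nominally over an infinite-dimensional space.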
4. A Bayesian connection

Sometimes, the minimizer of (1.1) is related to a Bayes estimate of $\mu$. In the Bayes formulation, $Y_j = \mu(t_j) + \epsilon_j$ where the $\epsilon_j$'s are independent normal random variables with zero means and variances equal to $\sigma^2$. The function $\mu$ is the realization of a stochastic process and is independent of the $\epsilon_j$'s.

The connection between $\hat{\mu}$, the minimizer of $\sum_j [Y_j - \mu(t_j)]^2 + \lambda \int (L\mu)^2$, and a Bayes estimate of $\mu$ was first given by Kimeldorf and Wahba [11] for the case that $L\mu = \mu^{(m)}$. The result was generalized to $L$'s as in (1.2) by Kohn and Ansley [12]. The function $\mu$ is defined on $\mathbb{R}$ and is generated by the stochastic differential equation $L\mu(t)\,dt = (\sigma/\sqrt{\lambda})\,dW(t)$ where $W$ is a mean zero Wiener process on $[a,b]$ with $\mathrm{var}(W(t)) = t$. Assume that $\mu$ satisfies the initial conditions: $\mu(a), \mu'(a), \ldots, \mu^{(m-1)}(a)$ are independent normal random variables with zero means and variances equal to $k$. Let $\hat{\mu}_k(t)$ be the posterior mean of $\mu(t)$ given $Y_1, \ldots, Y_n$. Then Kimeldorf and Wahba [11] and Kohn and Ansley [12] show that $\hat{\mu}(t) = \lim_{k \to \infty} \hat{\mu}_k(t)$.

Another Bayes connection arises in Gaussian process regression, a tool of machine learning (see, for instance, Rasmussen and Williams [18]). Consider $\mu$ defined on $A \subseteq \mathbb{R}^p$, with $\mu$ the realization of a mean zero stochastic process with covariance function $S$. Let $\hat{\mu}_S$ be the pointwise Bayes estimate of $\mu$:

\[ \hat{\mu}_S(t) = E(\mu(t) \mid Y_1, \ldots, Y_n) = S(t,\mathbf{t})' \left[ \sigma^2 I + S(\mathbf{t},\mathbf{t}) \right]^{-1} Y \]

where $S(t,\mathbf{t})'$ is an $n$-vector with $j$th entry $S(t,t_j)$, $S(\mathbf{t},\mathbf{t})$ is the $n \times n$ matrix with $jk$th entry $S(t_j,t_k)$ and $Y = (Y_1, \ldots, Y_n)'$. Then, as shown below, for an appropriately defined Reproducing Kernel Hilbert Space $\mathcal{H}_S$ with reproducing kernel $S$, the Bayes estimate of $\mu$ is equal to

\[ \arg\min_{\mu \in \mathcal{H}_S} \sum_{j=1}^n [Y_j - \mu(t_j)]^2 + \sigma^2 \langle \mu, \mu \rangle. \tag{4.1} \]

The existence of the space $\mathcal{H}_S$ with reproducing kernel $S$ is given by the Moore-Aronszajn Theorem (Aronszajn [4]). The space is defined by constructing finite-dimensional spaces: fix $J > 0$ and $t_1, \ldots, t_J \in A$ and consider the finite dimensional linear space of functions, $\mathcal{H}_{\{t_1, \ldots, t_J\}}$, consisting of all linear combinations of $S(t_1,\cdot), S(t_2,\cdot), \ldots, S(t_J,\cdot)$. Let $\mathcal{H}^*$ be the union of these $\mathcal{H}_{\{t_1, \ldots, t_J\}}$'s over all $J$ and all values of $t_1, \ldots, t_J$. Let $\langle \cdot, \cdot \rangle$ be the inner product on $\mathcal{H}^*$ generated by $\langle S(t_j,\cdot), S(t_k,\cdot) \rangle = S(t_j,t_k)$, that is, $\langle \sum_j a_j S(t_j,\cdot), \sum_k b_k S(x_k,\cdot) \rangle = \sum_{j,k} a_j b_k S(t_j,x_k)$. Let $\mathcal{H}_S$ be the completion of $\mathcal{H}^*$ under the norm associated with this inner product. Then $\mathcal{H}_S$ is a Reproducing Kernel Hilbert Space with reproducing kernel $S$. So, by Theorem 3.1, the solution to (4.1) is of the form $\mu(t) = \sum_{i=1}^n \beta_i S(t_i,t) = S(t,\mathbf{t})'\beta$, with the $\beta_j$'s chosen to minimize

\[ \sum_{j=1}^n \Big[ Y_j - \sum_{i=1}^n \beta_i S(t_i,t_j) \Big]^2 + \sigma^2 \beta' S(\mathbf{t},\mathbf{t}) \beta \]

where $\beta = (\beta_1, \ldots, \beta_n)'$. The minimizing $\beta$ is $[\sigma^2 I + S(\mathbf{t},\mathbf{t})]^{-1} Y$, and so the solution to (4.1) is equal to $\hat{\mu}_S$.
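
The equality between the Bayes estimate and the solution of (4.1) can be illustrated numerically. In the sketch below, the covariance function $S(s,t) = \exp(-|s-t|/0.3)$, the simulated data and the value of $\sigma^2$ are all assumptions made for illustration. The penalized problem is solved as an ordinary least squares problem on a stacked system, rather than through the closed form, so that the comparison is not circular.

```python
import numpy as np

def cov(s, t, length=0.3):
    # Assumed covariance function S(s, t) = exp(-|s - t| / length); illustrative choice.
    return np.exp(-np.abs(s - t) / length)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 1.0, 8)
y = np.cos(3.0 * t_obs) + 0.2 * rng.normal(size=t_obs.size)
sigma2 = 0.04

S_tt = cov(t_obs[:, None], t_obs[None, :])     # n x n matrix with entries S(t_j, t_k)
t_new = np.linspace(0.0, 1.0, 5)
S_new = cov(t_new[:, None], t_obs[None, :])    # each row holds S(t, t_j), j = 1..n

# Bayes estimate: S(t, t)' [sigma^2 I + S(t, t)]^{-1} Y at each new t.
bayes = S_new @ np.linalg.solve(sigma2 * np.eye(len(y)) + S_tt, y)

# Penalized problem: minimize ||Y - S beta||^2 + sigma^2 beta' S beta.
# Writing S = C C' (Cholesky), the penalty equals ||sigma C' beta||^2,
# so the objective is a least squares problem on a stacked design matrix.
C = np.linalg.cholesky(S_tt)
A = np.vstack([S_tt, np.sqrt(sigma2) * C.T])
b = np.concatenate([y, np.zeros(len(y))])
beta, *_ = np.linalg.lstsq(A, b, rcond=None)
penalized = S_new @ beta

print(np.max(np.abs(bayes - penalized)))
```

The printed discrepancy should be at the level of rounding error, reflecting that the two estimates coincide.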
5. Results for the cubic smoothing spline

Here, we minimize (1.3) using Theorem 3.1. The expressions for the reproducing kernels $R_0$ and $R_1$ are provided. The next section contains an algorithm for computing $R_0$ and $R_1$ for general $L$.

The first step to minimize (1.3) over $\mu \in \mathcal{H}^2[a,b]$ is to define the inner product on $\mathcal{H}^2[a,b]$:

\[ \langle f, g \rangle = f(a)g(a) + f'(a)g'(a) + \int_a^b f''(t)\,g''(t)\,dt. \]

Verifying that this is an inner product is straightforward, including showing that $\langle f, f \rangle = 0$ if and only if $f \equiv 0$. The proof that $\mathcal{H}^2[a,b]$ is complete under this inner product uses the completeness of $\mathcal{L}^2[a,b]$.

For (C.1) and (C.2) of Section 3, we partition $\mathcal{H}^2[a,b]$ into $\mathcal{H}_0$ and $\mathcal{H}_1$:

\[ \mathcal{H}_0 = \{ f : f''(t) \equiv 0 \} = \text{the span of } \{1, t\} \]

and

\[ \mathcal{H}_1 = \{ f \in \mathcal{H}^2[a,b] : f(a) = f'(a) = 0 \}. \]

$\mathcal{H}_1$ is the orthogonal complement of $\mathcal{H}_0$ and so $\mathcal{H}^2[a,b] = \mathcal{H}_0 \oplus \mathcal{H}_1$. (This is shown in Theorem 6.1 for $\mathcal{H}^m[a,b]$.)

For (C.3) let

\[ R_0(s,t) = 1 + (s-a)(t-a) \]

and

\[ R_1(s,t) = st\,(\min\{s,t\} - a) - \frac{s+t}{2} \left[ (\min\{s,t\})^2 - a^2 \right] + \frac{1}{3} \left[ (\min\{s,t\})^3 - a^3 \right]. \]

Then direct calculations verify that $R_0$ and $R_1$ are the reproducing kernels of, respectively, $\mathcal{H}_0$ and $\mathcal{H}_1$, that is, that $R_i(\cdot,t) \in \mathcal{H}_i$ and that $\langle R_i(\cdot,t), f \rangle = f(t)$ for all $f \in \mathcal{H}_i$, $i = 0, 1$.

To verify that condition (C.4) is satisfied, write $\mu = \mu_0 + \mu_1$, with $\mu_i \in \mathcal{H}_i$, $i = 0, 1$. Then $P(\mu) = \int (\mu'')^2 = \int (\mu_1'')^2 = \langle \mu_1, \mu_1 \rangle$.

We can show that $F_j(\mu) = \mu(t_j)$ is a continuous linear functional, either by using the definition of the inner product to verify continuity of $F_j$ or by noting that $R = R_0 + R_1$ is the reproducing kernel of $\mathcal{H}^2[a,b]$. Thus, by Theorem 3.1, to minimize (1.3) we can restrict attention to

\[ \mu(t) = \alpha_0 + \alpha_1 t + \sum_{j=1}^n \beta_j R_1(t_j,t) \]

and find $\alpha_0$, $\alpha_1$ and $\beta = (\beta_1, \ldots, \beta_n)'$ to minimize

\[ \sum_j \Big[ Y_j - \alpha_0 - \alpha_1 t_j - \sum_k \beta_k K[j,k] \Big]^2 + \lambda \beta' K \beta \]

where $K[j,k] = R_1(t_j,t_k)$. In matrix/vector form, we seek $\beta$ and $\alpha = (\alpha_0, \alpha_1)'$ to minimize

\[ \| Y - T\alpha - K\beta \|^2 + \lambda \beta' K \beta \tag{5.1} \]

with $Y = (Y_1, \ldots, Y_n)'$, $T_{i1} = 1$ and $T_{i2} = t_i$, $i = 1, \ldots, n$. One can minimize (5.1) directly, using matrix calculus.

Unfortunately, solving the matrix equations resulting from the differentiation of (5.1) involves inverting matrices which are ill-conditioned and large. Thus, the calculations are subject to round-off errors that seriously affect the accuracy of the solution. In addition, the matrices to be inverted are not sparse, so that $O(n^3)$ operations are required. This can be a formidable task for, say, $n = 1000$. The problem is due to the fact that the basis functions $1$, $t$, and $R_1(t_j,\cdot)$ are almost dependent with supports equal to the entire interval $[a,b]$. There are two ways around this problem. One way is to replace this inconvenient basis with a more stable one, one in which the elements have close to non-overlapping support. The most popular stable basis for this problem is that made up of cubic B-splines (see, e.g., Eubank [7]). The $i$th B-spline basis function has support $[t_i, t_{i+2}]$ and thus the matrices involved in the minimization of (1.3) are banded, well-conditioned, and fast to invert. Another approach is that of Reinsch ([19], [20]). The Reinsch algorithm yields a minimizer in $O(n)$ calculations. The approach for the Reinsch algorithm is based on a paper of Anselone and Laurent [2]. Section 6.4 gives this technique for minimization of expressions like (5.1).
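
For small $n$, the direct approach is nevertheless easy to sketch. The code below is an illustrative dense solve of (5.1), not the B-spline or Reinsch algorithm. It uses the kernel $R_1(s,t) = \int_a^{\min\{s,t\}} (s-u)(t-u)\,du$ in expanded form and the first-order conditions of (5.1), which are satisfied when $(K + \lambda I)\beta = Y - T\alpha$ and $T'\beta = 0$. The function names and the test data are assumptions.

```python
import numpy as np

def R1(s, t, a=0.0):
    # Expanded form of the integral of (s - u)(t - u) over u in [a, min(s, t)].
    m = np.minimum(s, t)
    return s * t * (m - a) - (s + t) / 2 * (m**2 - a**2) + (m**3 - a**3) / 3

def fit_cubic_smoothing_spline(t_obs, y, lam, a=0.0):
    """Dense O(n^3) solve of (5.1); intended for small n only."""
    n = len(y)
    T = np.column_stack([np.ones(n), t_obs])          # T[i, 0] = 1, T[i, 1] = t_i
    K = R1(t_obs[:, None], t_obs[None, :], a)         # K[j, k] = R1(t_j, t_k)
    M = K + lam * np.eye(n)
    Minv_T = np.linalg.solve(M, T)
    Minv_y = np.linalg.solve(M, y)
    alpha = np.linalg.solve(T.T @ Minv_T, T.T @ Minv_y)
    beta = np.linalg.solve(M, y - T @ alpha)
    return alpha, beta

def evaluate(t_new, t_obs, alpha, beta, a=0.0):
    """mu(t) = alpha_0 + alpha_1 t + sum_j beta_j R1(t_j, t)."""
    t_new = np.asarray(t_new, dtype=float)
    return alpha[0] + alpha[1] * t_new + R1(t_obs[:, None], t_new[None, :], a).T @ beta

# Exactly linear data incur no roughness penalty, so the fit should be the line itself.
t_obs = np.linspace(0.1, 0.9, 9)
y = 2.0 + 3.0 * t_obs
alpha, beta = fit_cubic_smoothing_spline(t_obs, y, lam=0.1)
print(alpha, np.max(np.abs(beta)))
```

With exactly linear data the penalized coefficients $\beta$ vanish and $\alpha$ recovers the intercept and slope, which is a convenient sanity check before using noisy data.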

6. Results for penalties with differential operators

Now consider the problem of minimizing (1.1) with penalty $P$ based on a differential operator $L$, as in (1.2), that is, of minimizing

\[ G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu), \ldots, F_n(\mu)) + \lambda \int_a^b [(L\mu)(t)]^2\,dt \tag{6.1} \]

over $\mu \in \mathcal{H}^m[a,b]$. We can apply Theorem 3.1 using the Reproducing Kernel Hilbert Space structure for $\mathcal{H}^m[a,b]$ defined in Section 6.1 below. We can then explicitly calculate the form of $\mu$ provided we can calculate reproducing kernels. Theorem 6.1 states a method for explicitly calculating reproducing kernels. Section 6.2 summarizes the algorithm for calculating reproducing kernels and the form of the minimizing $\mu$, and contains three examples of calculations. Theorem 6.1 and the calculations of Section 6.2 require results from the theory of differential equations. The Appendix contains these results, including a constructive proof of the existence of $G(\cdot,\cdot)$, the Green's function associated with the differential operator $L$. Section 6.4 contains a fast algorithm for minimizing (6.1) when $G$ is a sum of squares and $F_j(f) = f(t_j)$.

6.1. The form of the minimizer of (6.1)

Giving the form of the minimizing $\mu$ uses the result of Theorem A.1 in the Appendix, that there exist linearly independent $u_1, \ldots, u_m \in \mathcal{H}^m[a,b]$ with $m$ derivatives and that these functions form a basis for the set of all $\mu$ with $L\mu(t) = 0$ for almost every $t$. Furthermore $W(t)$, the Wronskian matrix associated with $u_1, \ldots, u_m$, is invertible for all $t \in [a,b]$. The Wronskian matrix is defined as

\[ [W(t)]_{ij} = u_i^{(j-1)}(t), \quad i, j = 1, \ldots, m. \]

The following is an inner product under which $\mathcal{H}^m[a,b]$ is a Reproducing Kernel Hilbert Space:

\[ \langle f, g \rangle = \sum_{j=0}^{m-1} f^{(j)}(a)\,g^{(j)}(a) + \int_a^b (Lf)(t)\,(Lg)(t)\,dt. \tag{6.2} \]

To show that this is, indeed, an inner product is straightforward, except to show that $\langle f, f \rangle = 0$ implies that $f \equiv 0$. But this follows immediately from Theorem A.4 in the Appendix.

Theorem 6.1. Let $L$ be as in (1.2), let $\{u_1, \ldots, u_m\}$ be a basis for the set of $\mu$ with $L\mu \equiv 0$ and let $W(t)$ be the associated Wronskian matrix. Then, under the inner product (6.2), $\mathcal{H}^m[a,b]$ is a Reproducing Kernel Hilbert Space with reproducing kernel $R(s,t) = R_0(s,t) + R_1(s,t)$ where

\[ R_0(s,t) = \sum_{i,j=1}^m C_{ij}\, u_i(s)\, u_j(t) \]

with

\[ C_{ij} = \left[ (W(a) W'(a))^{-1} \right]_{ij}, \]

where $W'(a)$ denotes the transpose of $W(a)$,

\[ R_1(s,t) = \int_{u=a}^{b} G(s,u)\, G(t,u)\,du \]

and $G(\cdot,\cdot)$ is the Green's function associated with $L$, as given in equations (A.1), (A.2) and (A.3) in the Appendix. Furthermore, $\mathcal{H}^m[a,b]$ can be partitioned into the direct sum of the two subspaces

\[ \mathcal{H}_0 = \text{the set of all } f \in \mathcal{H}^m[a,b] \text{ with } Lf(t) \equiv 0 \text{ for almost every } t = \text{the span of } u_1, \ldots, u_m \]

and

\[ \mathcal{H}_1 = \text{the set of all } f \in \mathcal{H}^m[a,b] \text{ with } f^{(j)}(a) = 0,\ j = 0, \ldots, m-1. \]

$\mathcal{H}_1$ is the orthogonal complement of $\mathcal{H}_0$. $\mathcal{H}_0$ has reproducing kernel $R_0$ and $\mathcal{H}_1$ has reproducing kernel $R_1$.

Proof. To prove the Theorem, it suffices to show the following.

(a) Any $f$ in $\mathcal{H}^m[a,b]$ can be written as $f = f_0 + f_1$, with $f_i \in \mathcal{H}_i$ and $\langle f_0, f_1 \rangle = 0$.
(b) $R_0$ is the reproducing kernel for $\mathcal{H}_0$ and $R_1$ is the reproducing kernel for $\mathcal{H}_1$.

Consider (a). Obviously, for $f_i \in \mathcal{H}_i$, $i = 0, 1$, $\langle f_0, f_1 \rangle$ is equal to zero, by the definition of the inner product in (6.2). To complete the proof of (a), fix $f \in \mathcal{H}^m[a,b]$ and find $c_1, \ldots, c_m$ such that, if $f_0 = \sum_i c_i u_i$, then $f_1 = f - f_0 \in \mathcal{H}_1$. That is, we find $c_1, \ldots, c_m$ such that, for $j = 0, \ldots, m-1$, $f_1^{(j)}(a) = 0$, that is, $f^{(j)}(a) - \sum_i c_i u_i^{(j)}(a) = 0$. Writing this in matrix notation and using the Wronskian matrix yields

\[ (f(a), f'(a), \ldots, f^{(m-1)}(a)) = (c_1, \ldots, c_m)\, W(a) \]

and we can solve this for $(c_1, \ldots, c_m)$, since the Wronskian $W(a)$ is invertible.

Consider (b). To prove that $R_1$ is the reproducing kernel for $\mathcal{H}_1$, first simplify notation, fixing $t \in [a,b]$ and letting $r(s) = R_1(s,t)$. We must show that $r \in \mathcal{H}_1$ and that $\langle r, f \rangle = f(t)$ for all $f \in \mathcal{H}_1$. Again, to simplify notation, let $h(u) = G(t,u)$. By definition of $R_1$, $r(s) = \int_a^b G(s,u)\,h(u)\,du$. By Theorems A.5 and A.6, $r \in \mathcal{H}_1$ and $Lr(s) = h(s) = G(t,s)$ for almost every $s$. Therefore, for $f \in \mathcal{H}_1$,

\[ \langle r, f \rangle = 0 + \int_a^b (Lr)(s)\,(Lf)(s)\,ds = \int_a^b G(t,s)\,(Lf)(s)\,ds = f(t) \]

by the definition of the Green's function. See equation (A.1).

Now consider $R_0$. Obviously, $R_0(\cdot,t) \in \mathcal{H}_0$, since it is a linear combination of the $u_i$'s. To show that $\langle R_0(\cdot,t), f \rangle = f(t)$, it suffices to consider $f = u_l$, $l = 1, \ldots, m$. Noting that $Lu_l \equiv 0$, write

\[
\begin{aligned}
\langle R_0(\cdot,t), u_l \rangle
&= \sum_{i,j=1}^m C_{ij}\, u_j(t)\, \langle u_i, u_l \rangle \\
&= \sum_{i,j=1}^m C_{ij}\, u_j(t) \Big[ \sum_{k=0}^{m-1} u_i^{(k)}(a)\, u_l^{(k)}(a) + 0 \Big] \\
&= \sum_{i,j=1}^m C_{ij}\, u_j(t)\, [W(a) W'(a)]_{il} \\
&= \sum_{j=1}^m u_j(t)\, [W(a) W'(a)\, C]_{lj} \\
&= u_l(t).
\end{aligned}
\]

□



We can now use Theorems 3.1 and 6.1 to write the form of the minimizer of 
(6.1). The proof of the following Theorem is straightforward. 
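Theorem 6.2. Suppose that $L$ is as in (1.2). Let $u_1, \ldots, u_m$ be a basis for the set of $\mu$'s with $L\mu \equiv 0$ and let $G$ be the corresponding Green's function, defined in equations (A.1), (A.2) and (A.3) in the Appendix. Let

\[ R_1(s,t) = \int_a^b G(s,u)\, G(t,u)\,du \]

and $\eta_{j1}(t) = F_j(R_1(\cdot,t))$. Then the minimizer of (6.1) must be of the form

\[ \mu(t) = \sum_{j=1}^m \alpha_j u_j(t) + \sum_{j=1}^n \beta_j \eta_{j1}(t), \]

where the $\alpha_j$'s and $\beta_j$'s minimize $G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu), \ldots, F_n(\mu)) + \lambda \beta' K \beta$, with $K$ as defined in Theorem 3.1.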

6.2. Algorithm and examples for calculating $R_0$, $R_1$ and the minimizing $\mu$

Suppose that we're given a linear differential operator $L$ as in (1.2). The following steps summarize results so far, describing how to calculate $R_0$ and $R_1$, the required reproducing kernels associated with $L$, and the $\mu$ that minimizes (6.1).

1. Find $u_1, \ldots, u_m$, a basis for the set of functions $\mu$ with $L\mu \equiv 0$.

2. Calculate $W(\cdot)$, the Wronskian of the $u_i$'s: $W_{ij}(t) = u_i^{(j-1)}(t)$.

3. Set $R_0(s,t) = \sum_{i,j} \left[ (W(a) W'(a))^{-1} \right]_{ij} u_i(s)\, u_j(t)$.

4. Calculate $(u_1^*(t), \ldots, u_m^*(t))$, the last row of the inverse of $W(t)$.

5. Find $G$, the associated Green's function: $G(t,u) = \sum_i u_i(t)\, u_i^*(u)$ for $u \le t$, $0$ else.

6. Set $R_1(s,t) = \int_a^b G(s,u)\, G(t,u)\,du$.

7. Find $\eta_{j1}$: $\eta_{j1}(t) = F_j(R_1(\cdot,t))$.

8. Calculate the symmetric matrix $K$: $K[j,k] = F_k(\eta_{j1})$. If $F_j(\mu) = \mu(t_j)$ and $F_k(\mu) = \mu(t_k)$ then $K[j,k] = R_1(t_j,t_k)$.

9. Set $\mu(t) = \sum_j \alpha_j u_j(t) + \sum_j \beta_j \eta_{j1}(t)$ and minimize $G(t_1, \ldots, t_n, Y_1, \ldots, Y_n, F_1(\mu), \ldots, F_n(\mu)) + \lambda \beta' K \beta$ with respect to $\beta$ and the $\alpha_j$'s.
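
Steps 2 through 6 can be sketched numerically once the Wronskian is available as a function of $t$. The code below is an illustrative sketch, not a production routine: it takes $L = D^2$ with the basis $u_1 = 1$, $u_2 = t$ as a running assumption, and approximates the integral in Step 6 with the trapezoid rule. All function names are hypothetical.

```python
import numpy as np

def wronskian_cubic(t):
    # Step 2 for the assumed case L = D^2, u_1 = 1, u_2 = t: row i holds (u_i, u_i').
    return np.array([[1.0, 0.0], [t, 1.0]])

def R0(s, t, W, a):
    # Step 3: sum_{i,j} [(W(a) W(a)')^{-1}]_{ij} u_i(s) u_j(t).
    C = np.linalg.inv(W(a) @ W(a).T)
    return W(s)[:, 0] @ C @ W(t)[:, 0]       # first column of W holds the u_i themselves

def green(t, u, W):
    # Steps 4 and 5: G(t, u) = sum_i u_i(t) u_i^*(u) for u <= t, and 0 otherwise.
    if u > t:
        return 0.0
    u_star = np.linalg.inv(W(u))[-1, :]      # last row of the inverse of W(u)
    return W(t)[:, 0] @ u_star

def R1(s, t, W, a, n_grid=2001):
    # Step 6: the integrand vanishes for u > min(s, t), so integrate over [a, min(s, t)].
    upper = min(s, t)
    if upper <= a:
        return 0.0
    u = np.linspace(a, upper, n_grid)
    vals = np.array([green(s, ui, W) * green(t, ui, W) for ui in u])
    h = u[1] - u[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # composite trapezoid rule

a, s, t = 0.0, 0.3, 0.7
print(R0(s, t, wronskian_cubic, a), R1(s, t, wronskian_cubic, a))
```

For this choice of $L$ the output can be compared with the closed forms $R_0(s,t) = 1 + (s-a)(t-a)$ and $R_1(s,t) = \int_a^{\min\{s,t\}} (s-u)(t-u)\,du$; other operators only require replacing the Wronskian function.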

The first step is the most challenging, and for some $L$'s, it may in fact be impossible to find the $u_j$'s in closed form. However, if $L$ is a linear differential operator with constant coefficients, then the first step is easy, using Theorem A.2. Alternatively, if one has an approximate model in mind defined in terms of known functions $u_1, \ldots, u_m$, then one can find the corresponding $L$ (see Example 3 below).

The reader can use these steps to derive the expressions in Section 5 for the cubic smoothing spline.

Although the calculation of the minimizing $\mu$ does not involve $R_0$, step 3 is included for completeness, to allow the reader to calculate the reproducing kernel, $R_0 + R_1$, for $\mathcal{H}^m[a,b]$ under the inner product (6.2).

Example 1. Suppose that $L\mu = \mu'$ and that the interval $[a,b]$ is equal to $[0,1]$. In Step 1, the basis for $L\mu = 0$ is $u_1(t) = 1$. In Step 2, the Wronskian is the one by one matrix with element equal to 1. So in Step 3, $R_0(s,t) = 1$. In Step 4, $u_1^*(s) = 1$ and so, in Step 5, $G(t,u) = 1$ if $u \le t$, $0$ else. Therefore

\[ R_1(s,t) = \int_0^{\min\{s,t\}} 1\,du = \min\{s,t\}. \]

Thus, we seek $\mu$ of the form

\[ \mu(t) = \alpha + \sum_{j=1}^n \beta_j F_j(R_1(\cdot,t)). \]

If $F_j(\mu) = \mu(t_j)$, $j = 1, \ldots, n$, then we seek

\[ \mu(t) = \alpha + \sum_{j=1}^n \beta_j \min\{t_j, t\}, \]

that is, the minimizing $\mu$ is piecewise linear with pieces defined in terms of $t_1, \ldots, t_n$. In Step 8, $K[j,k] = \min\{t_j, t_k\}$.

If, instead, $F_j(\mu) = \int_0^1 f_j \mu$ for known $f_j$, as in Section 2.2, then

\[ F_j(R_1(\cdot,t)) = \eta_{j1}(t) = \int_0^1 f_j(s)\, R_1(s,t)\,ds = \int_0^1 f_j(s) \min\{s,t\}\,ds = \int_0^t s f_j(s)\,ds + t \int_t^1 f_j(s)\,ds \]

and, in Step 8,

\[ K[j,k] = \int_0^1 f_k(t)\, \eta_{j1}(t)\,dt = \int_0^1 \!\! \int_0^1 f_k(t)\, f_j(s) \min\{s,t\}\,ds\,dt. \]
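
The splitting of the integral for $\eta_{j1}$ can be checked numerically for a particular weight function. The sketch below assumes the hypothetical choice $f_j(s) = s$, for which the two pieces give $\eta_{j1}(t) = t^3/3 + t(1 - t^2)/2$, and compares this with a trapezoid-rule approximation of $\int_0^1 f_j(s) \min\{s,t\}\,ds$.

```python
import numpy as np

def trapezoid(vals, h):
    # Composite trapezoid rule on an equally spaced grid with spacing h.
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def eta_numeric(t, f, n_grid=4001):
    # eta_{j1}(t) = integral over [0, 1] of f(s) * min(s, t) ds.
    s = np.linspace(0.0, 1.0, n_grid)
    return trapezoid(f(s) * np.minimum(s, t), s[1] - s[0])

def eta_closed(t):
    # Integral of s * s over [0, t] plus t times the integral of s over [t, 1],
    # for the assumed weight f_j(s) = s.
    return t**3 / 3.0 + t * (1.0 - t**2) / 2.0

f = lambda s: s
for t in (0.2, 0.5, 0.9):
    print(t, eta_numeric(t, f), eta_closed(t))
```

The two columns should agree to several decimal places; the same numerical integral can be reused for weights $f_j$ that have no convenient closed form.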



Example 2. Suppose that L/ = /" + 7/', 7 a real number. 

For Step 1, we can find ui and U2 via Theorem A. 2 in the Appendix. We first 
solve + 7x = for the two roots, ri = and r2 = —7. So 

ui{t) = 1 and U2{t) — exp(— 7t). 

For Step 2, we compute the Wronskian 

cxp(— 71) — 7exp(— 71) 
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For Step 3 we have 

[W{a)W'{a)]~^ 

So 



exp(7a) exp(27a) 



Ro{s,t) = CiiUi{s)ui{t) + Ci2ltl(s)u2(^) + C2lU2{s)ui{t) + C22U2{s)u2{t) 

= 1 + ^ - A exp(-7t*) - exp(-7s*) + \ cxp(-7(s* + t*)). 

with s* = s — a and t* = t — a. 

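For Step 4, inverting $W(t)$ we find that

$$u_1^*(t) = \frac{1}{\gamma} \quad \text{and} \quad u_2^*(t) = -\frac{1}{\gamma} \exp(\gamma t)$$

and so, in Step 5, the Green's function is given by

$$G(t,u) = \begin{cases} \frac{1}{\gamma} \left( 1 - \exp(-\gamma (t - u)) \right) & \text{for } u \le t \\ 0 & \text{else.} \end{cases}$$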

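To find $R_1(s,t)$ in Step 6, first suppose that $s \le t$. Then

$$R_1(s,t) = \int_a^s \gamma^{-2} \left( 1 - e^{-\gamma (s-u)} \right) \left( 1 - e^{-\gamma (t-u)} \right) du$$

$$= -\frac{1}{\gamma^3} + \frac{s^*}{\gamma^2} + \frac{1}{\gamma^3} \exp(-\gamma s^*) + \frac{1}{\gamma^3} \exp(-\gamma t^*)
- \frac{1}{2\gamma^3} \exp[-\gamma (t^* - s^*)] - \frac{1}{2\gamma^3} \exp[-\gamma (s^* + t^*)]. \tag{6.3}$$

Since $R_1(s,t) = R_1(t,s)$, if $t < s$, then $R_1(s,t)$ is obtained by interchanging $s^*$
and $t^*$ in the above.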

Therefore, to minimize (6.1) over $\mu \in \mathcal{H}^2[a, b]$, we seek $\mu$ of the form

$$\mu(t) = \alpha_1 + \alpha_2 \exp(-\gamma t) + \sum_{j=1}^n \beta_j F_j(R_1(\cdot,t)).$$

The calculations in Steps 7 and 8 for $\eta_j(t) = F_j(R_1(\cdot,t))$ and $K$ are tedious
except in the case that $F_j(f) = f(t_j)$.
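The closed form (6.3) is easy to mistype, so a numerical cross-check can be useful. The following sketch, assuming NumPy and using hypothetical function names, evaluates (6.3) and compares it with a trapezoid-rule approximation of $\int_a^{\min\{s,t\}} G(s,u) G(t,u) \, du$ for the Green's function of Step 5. It is an illustration under these assumptions, not the author's code.

```python
import numpy as np

def green(t, u, gamma):
    """G(t, u) = (1 - exp(-gamma (t - u))) / gamma for u <= t, else 0."""
    return np.where(u <= t, (1.0 - np.exp(-gamma * (t - u))) / gamma, 0.0)

def r1_closed(s, t, gamma, a=0.0):
    """Equation (6.3), with s and t swapped when t < s."""
    s_star, t_star = min(s, t) - a, max(s, t) - a
    g2, g3 = gamma**2, gamma**3
    return (-1.0 / g3 + s_star / g2
            + np.exp(-gamma * s_star) / g3 + np.exp(-gamma * t_star) / g3
            - np.exp(-gamma * (t_star - s_star)) / (2 * g3)
            - np.exp(-gamma * (s_star + t_star)) / (2 * g3))

def r1_numeric(s, t, gamma, a=0.0, n_grid=20_001):
    """Trapezoid-rule value of the integral of G(s, u) G(t, u) over [a, min{s, t}]."""
    u = np.linspace(a, min(s, t), n_grid)
    vals = green(s, u, gamma) * green(t, u, gamma)
    du = u[1] - u[0]
    return float(du * (vals.sum() - 0.5 * (vals[0] + vals[-1])))
```

For instance, `r1_closed(0.3, 0.8, 1.5)` and `r1_numeric(0.3, 0.8, 1.5)` should agree to many digits, and the closed form is symmetric in its first two arguments by construction.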

Example 3. Instead of specifying the operator $L$, one might more easily specify
basis functions $u_1, \ldots, u_m$ for a preferred approximate parametric model. For
instance, one might think that $\mu$ is approximately a constant plus a damped
sinusoid: $\mu(t) \approx \alpha_1 + \alpha_2 \sin(t) \exp(-t)$. Given $u_1, \ldots, u_m$, one can easily find
the operator $L$ so that $L u_i = 0$, $i = 1, \ldots, m$, and thus one can define an
estimate of $\mu$ as the minimizer of (6.1). Assume that each $u_i$ has $m$ continuous
derivatives and that the associated Wronskian matrix $W(t)$ is invertible for all
$t \in [a, b]$. To find $L$, we solve for the $\omega_j$'s in (1.2):

$$0 = (L u_i)(t) = u_i^{(m)}(t) + \sum_{j=0}^{m-1} \omega_j(t) u_i^{(j)}(t),$$

that is

$$u_i^{(m)}(t) = -\sum_{j=0}^{m-1} \omega_j(t) u_i^{(j)}(t).$$

This can be written in matrix/vector form as

$$W(t) \begin{pmatrix} \omega_0(t) \\ \vdots \\ \omega_{m-1}(t) \end{pmatrix} = -\begin{pmatrix} u_1^{(m)}(t) \\ \vdots \\ u_m^{(m)}(t) \end{pmatrix},$$

yielding

$$\begin{pmatrix} \omega_0(t) \\ \vdots \\ \omega_{m-1}(t) \end{pmatrix} = -W(t)^{-1} \begin{pmatrix} u_1^{(m)}(t) \\ \vdots \\ u_m^{(m)}(t) \end{pmatrix}.$$

The $\omega_j$'s are continuous, by our assumptions concerning the $u_i$'s and
the invertibility of $W(t)$.

For the example with $u_1 = 1$ and $u_2 = \sin(t) \exp(-t)$, we find that

$$W(t) = \begin{pmatrix} 1 & 0 \\ \sin(t) \exp(-t) & \exp(-t) [\cos(t) - \sin(t)] \end{pmatrix},$$

which is invertible on $[a, b]$ provided $\cos(t) \ne \sin(t)$ for $t \in [a, b]$. In this case,
$\omega_0(t) = 0$, $\omega_1(t) = 2 \cos(t) / [\cos(t) - \sin(t)]$ and so the associated differential
operator is $(L\mu)(t) = \mu''(t) + 2 \mu'(t) \cos(t) / [\cos(t) - \sin(t)]$. Note that we do not
need $L$ to proceed with the minimization of (6.1): we only need $u_1, \ldots, u_m$ to
calculate the required reproducing kernels. However, if we would like to cast the
problem in the Bayesian model of Section 4, we require $L$.
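The linear system for the $\omega_j$'s in Example 3 can be solved pointwise in $t$. The sketch below does this for $u_1 = 1$, $u_2 = \sin(t) \exp(-t)$, assuming NumPy; the function names are hypothetical and the derivatives of $u_2$ are entered by hand. It is meant only to illustrate the matrix/vector form, under the convention that row $i$ of $W(t)$ holds $u_i$ and its derivatives.

```python
import numpy as np

def wronskian(t):
    """W(t) for u1 = 1, u2 = sin(t) exp(-t); row i holds u_i and u_i'."""
    e = np.exp(-t)
    return np.array([[1.0, 0.0],
                     [np.sin(t) * e, e * (np.cos(t) - np.sin(t))]])

def second_derivs(t):
    """(u1''(t), u2''(t)); here u2''(t) = -2 cos(t) exp(-t)."""
    return np.array([0.0, -2.0 * np.cos(t) * np.exp(-t)])

def omega(t):
    """Solve W(t) omega = -(u1'', u2'')' for (omega_0(t), omega_1(t))."""
    return np.linalg.solve(wronskian(t), -second_derivs(t))
```

At any $t$ with $\cos(t) \ne \sin(t)$, `omega(t)` should return approximately $(0, \; 2\cos(t)/[\cos(t) - \sin(t)])$, in agreement with the expressions in the example.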



6.3. Minimization of the penalized weighted sum of squares via
matrix calculus

Consider minimizing a specific form of (6.1) over $\mu \in \mathcal{H}^m[a, b]$, namely minimizing

$$\sum_{j=1}^n d_j [Y_j - F_j(\mu)]^2 + \lambda \int (L\mu)^2 \tag{6.4}$$

for known and positive $d_j$'s. We can rewrite this as a minimization problem easily
solved by matrix/vector calculations, provided we can find a basis $\{u_1, \ldots, u_m\}$
for the set of $\mu$ with $L\mu = 0$.

Theorem 6.2 implies that, to minimize (6.4), we must find $\alpha = (\alpha_1, \ldots, \alpha_m)'$
and $\beta = (\beta_1, \ldots, \beta_n)'$ to minimize

$$(Y - T\alpha - K\beta)' D (Y - T\alpha - K\beta) + \lambda \beta' K \beta$$

where $Y = (Y_1, \ldots, Y_n)'$, $T$ is $n \times m$ with $T[i,j] = u_j(t_i)$, $K$ is $n \times n$ with
$K[j,k] = F_j(\eta_k)$, and $D$ is an $n$ by $n$ diagonal matrix with $D[i,i] = d_i$. Assume,
as is typically the case, that $T$ is of full rank and $K$ is invertible. Taking
derivatives with respect to $\alpha$ and $\beta$ and setting equal to zero yields

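$$T'D(Y - K\beta) = T'DT\alpha \tag{6.5}$$

and

$$-2K'D(Y - T\alpha - K\beta) + 2\lambda K\beta = 0,$$

which is equivalent to

$$Y - T\alpha - (K + \lambda D^{-1})\beta = 0.$$

Let

$$M = K + \lambda D^{-1}.$$

Then

$$\beta = M^{-1}(Y - T\alpha). \tag{6.6}$$

Substituting this into (6.5) yields

$$T'D[I - KM^{-1}]Y = T'D[I - KM^{-1}]T\alpha,$$

that is

$$T'D[M - K]M^{-1}Y = T'D[M - K]M^{-1}T\alpha$$

or $\lambda T'M^{-1}Y = \lambda T'M^{-1}T\alpha$.

Therefore, provided $T$ is of full rank,

$$\alpha = (T'M^{-1}T)^{-1}T'M^{-1}Y \tag{6.7}$$

and

$$\beta = M^{-1}[I - T(T'M^{-1}T)^{-1}T'M^{-1}]Y. \tag{6.8}$$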
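Equations (6.6) and (6.7) translate directly into a few lines of linear algebra. The sketch below, assuming NumPy, uses the setting of Example 1 ($u_1 = 1$, $R_1(s,t) = \min\{s,t\}$, point evaluations) with made-up data; the function name `solve_direct`, the data and the value of $\lambda$ are illustrative assumptions. It solves linear systems rather than forming $M^{-1}$ explicitly, which is a numerical convenience and not something the text prescribes.

```python
import numpy as np

def solve_direct(Y, T, K, d, lam):
    """Return (alpha, beta) from equations (6.7) and (6.6)."""
    M = K + lam * np.diag(1.0 / d)                       # M = K + lambda D^{-1}
    Minv_T = np.linalg.solve(M, T)
    Minv_Y = np.linalg.solve(M, Y)
    alpha = np.linalg.solve(T.T @ Minv_T, T.T @ Minv_Y)  # (6.7)
    beta = np.linalg.solve(M, Y - T @ alpha)             # (6.6), equivalent to (6.8)
    return alpha, beta

# Hypothetical toy data: L mu = mu' on [0, 1], so u_1 = 1 and R_1(s, t) = min{s, t}.
n = 8
t = np.linspace(0.1, 0.9, n)
Y = np.sin(2 * np.pi * t)
T = np.ones((n, 1))
K = np.minimum(t[:, None], t[None, :])
d = np.linspace(1.0, 2.0, n)
lam = 0.1
alpha, beta = solve_direct(Y, T, K, d, lam)
```

The resulting pair satisfies the stationarity condition (6.5) up to rounding error, and also $T'\beta = 0$.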

Unfortunately, using equations (6.7) and (6.8) results in computational problems
since typically $M$ is an ill-conditioned matrix and thus difficult to invert.
Furthermore, $M$ is $n \times n$ and $n$ is typically large, making inversion expensive.
Fortunately, when $F_j(f) = f(t_j)$ we can transform the problem to alleviate the
difficulties and to speed computation. The details are given in the next section.

6.4. Algorithm for minimizing the penalized weighted sum of
squares when $F_j(f) = f(t_j)$

Assume that $F_j(f) = f(t_j)$, that $a < t_1 < \cdots < t_n < b$, that $T$ is of full rank
$m$ and that $K$ is invertible. The goal is to rewrite $\alpha$ in (6.7) and $\beta$ in
(6.8) so that we only need to invert small or banded matrices. Meeting this goal
involves defining a "good" matrix $Q$ and showing that

$$\beta = Q(Q'MQ)^{-1}Q'Y \tag{6.9}$$

and

$$\alpha = (T'T)^{-1}T'(Y - M\beta). \tag{6.10}$$

We will define Q so that Q'MQ is banded and thus easy to invert. To begin, 
let Q be an n by n — m matrix of full column rank such that Q'T is an n — m by 
m matrix of zeroes. Q isn't unique, but later, further restrictions will be placed 
on Q so that Q'MQ is banded. 

We first show that $T'\beta = 0$. This will imply that there exists an $(n - m)$-vector
$\gamma$ such that $\beta = Q\gamma$. From (6.6)

$$Y = M\beta + T\alpha. \tag{6.11}$$

Substituting this into (6.7) yields

$$\alpha = (T'M^{-1}T)^{-1}T'\beta + \alpha.$$

Therefore

$$(T'M^{-1}T)^{-1}T'\beta = 0$$

and so $T'\beta = 0$ and $\beta = Q\gamma$ for some $\gamma$. To find $\gamma$, use (6.6):

$$Q'M\beta = Q'(Y - T\alpha) = Q'Y$$

since $Q'T = 0$. So $Q'MQ\gamma = Q'Y$, yielding

$$\gamma = (Q'MQ)^{-1}Q'Y.$$

Therefore equation (6.9) holds. Equation (6.10) follows immediately from equation
(6.11).

We can also find an easy-to-compute form for $\hat{Y} = T\alpha + K\beta$ using (6.11):

$$Y = (K + \lambda D^{-1})\beta + T\alpha = \hat{Y} + \lambda D^{-1}\beta$$

and so

$$\hat{Y} = Y - \lambda D^{-1}\beta.$$

Note that we have not yet used the fact that $F_j(f) = f(t_j)$. In the special
case that $F_j(f) = f(t_j)$, we can choose $Q$ so that $Q'MQ$ is banded. Specifically,
in addition to requiring that $Q'T = 0$, we also seek $Q$ with

$$Q_{ij} = 0 \text{ unless } i = j, j + 1, \ldots, j + m. \tag{6.12}$$

So we want $Q$ with $[Q'T]_{ij} = \sum_{l=0}^m Q_{i+l,i} u_j(t_{i+l}) = 0$ for all $j = 1, \ldots, m$,
$i = 1, \ldots, n - m$. That is, for each $i$, we seek an $(m+1)$-vector $q_i = (Q_{ii}, \ldots, Q_{i+m,i})'$
satisfying $q_i' T_i = 0$, where $T_i$ is the $(m + 1)$ by $m$ matrix with $lj$th entry equal
to $u_j(t_{i+l})$, $l = 0, \ldots, m$. This is easily done by a QR decomposition of $T_i$: the matrix $T_i$ can
be written as $T_i = Q_i R_i$ for some $Q_i$, an $(m + 1) \times (m + 1)$ orthonormal matrix,
and some $R_i$, $(m + 1) \times m$ with last row equal to 0. Take $q_i$ to be the last
column of $Q_i$.
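The QR-based construction of $Q$ can be sketched as follows, assuming NumPy. The function name `build_Q` and the example design are hypothetical. `numpy.linalg.qr` with `mode="complete"` returns the full $(m+1) \times (m+1)$ orthogonal factor, whose last column is orthogonal to every column of $T_i$.

```python
import numpy as np

def build_Q(T):
    """Build an n x (n - m) matrix Q with Q'T = 0 and the band pattern (6.12)."""
    n, m = T.shape
    Q = np.zeros((n, n - m))
    for i in range(n - m):                       # 0-based column index
        T_i = T[i:i + m + 1, :]                  # (m + 1) x m block: rows i..i+m of T
        Q_i, _ = np.linalg.qr(T_i, mode="complete")
        Q[i:i + m + 1, i] = Q_i[:, -1]           # last column annihilates T_i
    return Q

# Hypothetical example with the basis of Example 2: u_1 = 1, u_2 = exp(-gamma t).
t = np.linspace(0.1, 0.9, 9)
T = np.column_stack([np.ones_like(t), np.exp(-1.5 * t)])
Q = build_Q(T)
```

Each column of this `Q` has unit length; any nonzero rescaling of the columns would serve equally well, since only $Q'T = 0$ and the band pattern are used.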

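We now show that $Q'MQ$ is banded, specifically, that $[Q'MQ]_{kl} = 0$ whenever
$|k - l| > m$. Write $Q'MQ = Q'KQ + \lambda Q'D^{-1}Q$. Since $D$ is diagonal, one easily
shows that $[Q'D^{-1}Q]_{kl} = 0$ for $|k - l| > m$. To show that the same is true for
$Q'KQ$, write

$$K[i,j] = R_1(t_i,t_j) = \int_a^b G(t_i,\omega) G(t_j,\omega) \, d\omega
= \sum_{r,s=1}^m u_r(t_i) u_s(t_j) \int_a^{\min\{t_i,t_j\}} u_r^*(\omega) u_s^*(\omega) \, d\omega$$

$$\equiv \sum_{r,s=1}^m u_r(t_i) u_s(t_j) \mathcal{F}_{r,s}(\min\{t_i,t_j\})
= \sum_{r,s=1}^m T_{ir} T_{js} \mathcal{F}_{r,s}(\min\{t_i,t_j\}).$$

Since $Q'KQ$ is symmetric, it suffices to show that $[Q'KQ]_{kl} = 0$ for $k - l > m$.
So fix $k$ and $l$ with $k - l > m$ and write

$$[Q'KQ]_{kl} = \sum_{i,j=1}^n Q_{ik} K_{ij} Q_{jl} = \sum_{i,j=0}^m Q_{k+i,k} K_{k+i,l+j} Q_{l+j,l}$$

$$= \sum_{i,j=0}^m \sum_{r,s=1}^m Q_{k+i,k} \mathcal{F}_{r,s}(\min\{t_{k+i}, t_{l+j}\}) T_{k+i,r} T_{l+j,s} Q_{l+j,l}$$

$$= \sum_{j=0}^m \sum_{r,s=1}^m \mathcal{F}_{r,s}(t_{l+j}) T_{l+j,s} Q_{l+j,l} \sum_{i=0}^m Q_{k+i,k} T_{k+i,r}.$$

The last equality follows since $k > l + m$ and $0 \le i, j \le m$ imply that $k + i > l + j$
and so $t_{l+j} < t_{k+i}$. We immediately have that $[Q'KQ]_{kl} = 0$, since

$$\sum_{i=0}^m Q_{k+i,k} T_{k+i,r} = [Q'T]_{kr} = 0.$$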



Thus minimizing (6.4) when $F_j(f) = f(t_j)$ is easily and quickly done through
the following steps.

1. Follow steps 1 through 8 of Section 6.2 to find $u_1, \ldots, u_m$, a basis for
$L\mu = 0$, the reproducing kernel $R_1$ and the matrix $K$: $K[i,j] = R_1(t_i,t_j)$.

2. Calculate the matrix $T$: $T[i,j] = u_j(t_i)$.

3. Find $Q$, $n$ by $(n - m)$, of full column rank satisfying equation (6.12) and
$Q'T = 0$. One can find $Q$ directly or by the method outlined below equation
(6.12).

4. Find $\beta$ and $\alpha$ using equations (6.9) and (6.10). Speed the matrix inversion
by using the fact that $Q'MQ$ is banded.
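The four steps above can be sketched end to end for the operator of Example 2, $L\mu = \mu'' + \gamma\mu'$ with $a = 0$. This is a minimal illustration assuming NumPy; the names `r1`, `build_Q` and `fit`, the simulated data and the parameter values are all hypothetical. For brevity it solves the banded system with a dense solver, so it does not exploit the band structure; a banded solver could be substituted for `np.linalg.solve` on `B`.

```python
import numpy as np

def r1(s, t, gamma, a=0.0):
    """Kernel (6.3) of Example 2, written symmetrically in (s, t)."""
    lo, hi = np.minimum(s, t) - a, np.maximum(s, t) - a
    g3 = gamma**3
    return (-1.0 / g3 + lo / gamma**2
            + (np.exp(-gamma * lo) + np.exp(-gamma * hi)) / g3
            - (np.exp(-gamma * (hi - lo)) + np.exp(-gamma * (hi + lo))) / (2 * g3))

def build_Q(T):
    """QR construction of Q (n x (n - m)) with Q'T = 0 and band pattern (6.12)."""
    n, m = T.shape
    Q = np.zeros((n, n - m))
    for i in range(n - m):
        Q_i, _ = np.linalg.qr(T[i:i + m + 1, :], mode="complete")
        Q[i:i + m + 1, i] = Q_i[:, -1]
    return Q

def fit(Y, t, d, lam, gamma):
    """Steps 1-4 for L mu = mu'' + gamma mu' with F_j(f) = f(t_j)."""
    K = r1(t[:, None], t[None, :], gamma)                        # Step 1
    T = np.column_stack([np.ones_like(t), np.exp(-gamma * t)])   # Step 2
    M = K + lam * np.diag(1.0 / d)
    Q = build_Q(T)                                               # Step 3
    B = Q.T @ M @ Q                       # banded: zero when |k - l| > m
    beta = Q @ np.linalg.solve(B, Q.T @ Y)                       # (6.9)
    alpha = np.linalg.solve(T.T @ T, T.T @ (Y - M @ beta))       # (6.10)
    fitted = Y - lam * beta / d           # Y-hat = Y - lambda D^{-1} beta
    return alpha, beta, fitted, B

# Hypothetical data on an equally spaced design.
n = 12
t = np.arange(1, n + 1) / (n + 1)
rng = np.random.default_rng(1)
Y = np.cos(3 * t) + 0.05 * rng.standard_normal(n)
alpha, beta, fitted, B = fit(Y, t, np.ones(n), lam=0.01, gamma=2.0)
```

The returned `alpha` should match the direct formula (6.7) up to rounding error, and the entries of `B` more than $m = 2$ places off the diagonal should be numerically zero.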

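Example 2 continued from Section 6.2. Suppose that we want to minimize

$$\sum_{j=1}^n d_j (Y_j - \mu(t_j))^2 + \lambda \int_0^1 (\mu''(t) + \gamma \mu'(t))^2 \, dt$$

over $\mu \in \mathcal{H}^2[0, 1]$. For simplicity, assume that $t_i = i/(n + 1)$. Using the
calculations from Section 6.2, we set $T_{i1} = 1$, $T_{i2} = \exp(-\gamma t_i)$, and $K[i,j] = R_1(t_i,t_j)$,
with $R_1$ as in (6.3).

For Step 3, we find $Q$ directly: we seek $Q$, $n$ by $(n - 2)$, with $Q_{ij} = 0$ unless
$i = j, j + 1, j + 2$ and

$$0 = [Q'T]_{ij} = Q_{ii} T_{ij} + Q_{i+1,i} T_{i+1,j} + Q_{i+2,i} T_{i+2,j}.$$

Thus, for $j = 1$,

$$0 = Q_{ii} + Q_{i+1,i} + Q_{i+2,i}$$

and, for $j = 2$,

$$0 = Q_{ii} \exp(-\gamma t_i) + Q_{i+1,i} \exp(-\gamma t_{i+1}) + Q_{i+2,i} \exp(-\gamma t_{i+2}).$$

We take

$$Q_{ii} = 1 - \exp\left( \frac{-\gamma}{n + 1} \right), \quad
Q_{i+1,i} = -\exp\left( \frac{\gamma}{n + 1} \right) + \exp\left( \frac{-\gamma}{n + 1} \right)
\quad \text{and} \quad
Q_{i+2,i} = \exp\left( \frac{\gamma}{n + 1} \right) - 1.$$

Continuing with the fourth step to find $\alpha$ and $\beta$ is straightforward.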
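A direct choice of $Q$ for this example can be checked numerically. The sketch below assumes NumPy and assumes the entries $Q_{ii} = 1 - \exp(-\gamma/(n+1))$, $Q_{i+1,i} = -\exp(\gamma/(n+1)) + \exp(-\gamma/(n+1))$ and $Q_{i+2,i} = \exp(\gamma/(n+1)) - 1$, which solve the two linear constraints for the equally spaced design $t_i = i/(n+1)$. The function name `example2_Q` and the values of $n$ and $\gamma$ in the usage note are hypothetical.

```python
import numpy as np

def example2_Q(n, gamma):
    """Banded Q for t_i = i/(n+1) with T[:, 0] = 1 and T[:, 1] = exp(-gamma t_i)."""
    h = gamma / (n + 1)
    Q = np.zeros((n, n - 2))
    for i in range(n - 2):                # 0-based column index
        Q[i, i] = 1.0 - np.exp(-h)
        Q[i + 1, i] = -np.exp(h) + np.exp(-h)
        Q[i + 2, i] = np.exp(h) - 1.0
    return Q
```

With, say, `n = 10` and `gamma = 1.5`, forming `T` from the columns $1$ and $\exp(-\gamma t_i)$ and computing `example2_Q(10, 1.5).T @ T` should give a matrix of zeros up to rounding error.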
Appendix A 

The Appendix contains background on the solution of linear differential equations
$L\mu = 0$ with $L$ as in (1.2). Section A.2 contains results about $G$, the
Green's function associated with $L$.

A.1. Differential Equations

Details of results in this section can be found in Coddington [6]. The main
Theorem, stated without proof, follows.

Theorem A.1. Let $L$ be as in (1.2). Then there exists $u_1, \ldots, u_m$, a basis for the
set of all $\mu$ with $L\mu = 0$, with each $u_i$ real-valued and having $m$ derivatives.
Furthermore, any such basis will have an invertible Wronskian matrix $W(t)$ for
all $t \in [a, b]$. The Wronskian matrix is defined as

$$W(t) = \begin{pmatrix} u_1(t) & u_1'(t) & \cdots & u_1^{(m-1)}(t) \\ \vdots & \vdots & & \vdots \\ u_m(t) & u_m'(t) & \cdots & u_m^{(m-1)}(t) \end{pmatrix}.$$

The following Theorem, stated without proof, is useful for calculating the
basis functions in the case that the $\omega_j$'s are constants.

Theorem A.2. Suppose that $L$ is as in (1.2), with the $\omega_j$'s real numbers. Denote
the $s$ distinct roots of the polynomial $x^m + \sum_{j=0}^{m-1} \omega_j x^j$ as $r_1, \ldots, r_s$. Let
$m_i$ denote the multiplicity of root $r_i$ (so $m = \sum_{i=1}^s m_i$). Then the following $m$
functions of $t$ form a basis for the set of all $\mu$ with $L\mu = 0$:

$$\exp(r_i t), \; t \exp(r_i t), \; \ldots, \; t^{m_i - 1} \exp(r_i t), \quad i = 1, \ldots, s.$$
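For constant coefficients, the roots in Theorem A.2 can be found numerically. The sketch below assumes NumPy and uses `numpy.roots`, which expects coefficients ordered from the highest power down. The function names are hypothetical, and the multiplicities are supplied by the user rather than detected, since deciding numerically whether two computed roots coincide is delicate.

```python
import numpy as np

def characteristic_roots(omega):
    """Roots of x^m + sum_j omega[j] x^j for constants omega_0, ..., omega_{m-1}."""
    coeffs = np.concatenate(([1.0], np.asarray(omega, dtype=float)[::-1]))
    return np.roots(coeffs)

def basis_from_roots(roots_with_mult):
    """Return callables t -> t^k exp(r t), k = 0..mult-1, for each (r, mult) pair."""
    funcs = []
    for r, mult in roots_with_mult:
        for k in range(mult):
            # Default arguments freeze r and k for each closure.
            funcs.append(lambda t, r=r, k=k: t**k * np.exp(r * t))
    return funcs
```

For Example 2 with $\gamma = 1.5$, `characteristic_roots([0.0, 1.5])` returns the roots $-1.5$ and $0$, and `basis_from_roots([(0.0, 1), (-1.5, 1)])` gives the functions $1$ and $\exp(-1.5t)$. For $L\mu = \mu''$, the double root at $0$ gives the basis $1, t$.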

The following result, stated without proof, is useful for checking that a set of
functions does form a basis for the set of all $\mu$ with $L\mu = 0$.

Theorem A.3. Suppose that $u_1, \ldots, u_m$ have $m$ derivatives on $[a, b]$ and that
$L u_i = 0$. If $W(t_0)$ is invertible at some $t_0 \in [a, b]$, then the $u_i$'s are linearly
independent, and thus a basis for the set of all $\mu$ with $L\mu = 0$.

The following result was useful in defining the inner product in equation (6.2),
where $t_0$ was taken to be $a$.

Theorem A.4. Suppose that $L$ is as in (1.2) and let $t_0 \in [a, b]$. Then the only
function in $\mathcal{H}^m[a, b]$ that satisfies $Lf =$ the zero function and $f^{(j)}(t_0) = 0$,
$j = 0, \ldots, m - 1$, is the zero function.

Proof. By Theorem A.1, there exists $u_1, \ldots, u_m$, a basis for the set of all $\mu$ with
$L\mu = 0$, with $W(t)$ invertible for all $t \in [a, b]$. Suppose $Lf = 0$. Then $f = \sum_i c_i u_i$
for some $c_i$'s. We see that the conditions $f^{(j)}(t_0) = 0$, $j = 0, \ldots, m - 1$, can be
written in matrix/vector form as $(c_1, \ldots, c_m) W(t_0) = (0, \ldots, 0)$. Since $W(t_0)$
is invertible, $c_i = 0$, $i = 1, \ldots, m$. □



A.2. The Green's Function Associated with the Differential
Operator L

Suppose that $L$ is as in (1.2). The definition below introduces $G(\cdot, \cdot)$,
the Green's function associated with $L$ with specified boundary conditions.
Theorem A.5 gives an explicit form of $G$.

Definition. $G$ is a Green's function for $L$ if and only if

$$f(t) = \int_a^b G(t,u) \, (Lf)(u) \, du \tag{A.1}$$

for all $f \in \mathcal{H}^m[a, b]$ satisfying the boundary conditions

$$f^{(j)}(a) = 0, \quad j = 0, \ldots, m - 1. \tag{A.2}$$

Of course, it's not immediately clear that such a function $G$ exists. However,
$G$ exists and is easily calculated using the Wronskian matrix associated with $L$
(see Theorem A.5). Recall from Theorem A.1 of Section A.1 that there exists a
basis for the set of all $\mu$ with $L\mu = 0$, $u_1, \ldots, u_m$, with invertible Wronskian.
Furthermore, each $u_i$ has $m$ derivatives.

Lemma A.1. Let $u_1^*(t), \ldots, u_m^*(t)$ denote the entries in the last row of the
inverse of $W(t)$. Then $u_i^*$ is continuous, $i = 1, \ldots, m$.

Proof. The $u_i^*$'s are continuous, since $u_i^* = (\det W(t))^{-1}$ times an expression
involving sums and products of $u_l^{(j)}$, $l = 1, \ldots, m$, $j = 0, \ldots, m - 1$, and the $u_l$'s
have $m - 1$ continuous derivatives. □

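Theorem A.5. Let $u_1^*(t), \ldots, u_m^*(t)$ denote the entries in the last row of the
inverse of $W(t)$. Then

$$G(t,u) = \begin{cases} \sum_{i=1}^m u_i(t) \, u_i^*(u) & \text{for } u \le t \\ 0 & \text{otherwise} \end{cases} \tag{A.3}$$

is a Green's function for $L$ and, for each fixed $t \in [a, b]$, $G(t, \cdot)$ is in $L^2[a, b]$.

The following theorem will be useful in the proof of Theorem A.5.

Theorem A.6. Let $G$ be as in (A.3) and suppose that $h \in L^2[a, b]$. If

$$r(t) = \int_a^b G(t,u) \, h(u) \, du,$$

then

$$r \in \mathcal{H}^m[a, b], \tag{A.4}$$

$$(Lr)(t) = h(t) \text{ almost everywhere } t \in [a, b] \tag{A.5}$$

and

$$r^{(j)}(a) = 0, \quad j = 0, \ldots, m - 1. \tag{A.6}$$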

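Proof. Write

$$r(t) = \sum_{i=1}^m u_i(t) \int_a^t u_i^*(u) \, h(u) \, du.$$

We'll first show that

$$r^{(j)}(t) = \sum_{i=1}^m u_i^{(j)}(t) \int_a^t u_i^*(u) \, h(u) \, du, \quad j = 0, \ldots, m - 1 \tag{A.7}$$

and

$$r^{(m)}(t) = h(t) + \sum_{i=1}^m u_i^{(m)}(t) \int_a^t u_i^*(u) \, h(u) \, du \text{ almost everywhere } t \in [a, b]. \tag{A.8}$$

These equations follow easily by induction on $j$. We only present the case $j = 1$.
Then

$$r'(t) = \sum_{i=1}^m u_i'(t) \int_a^t u_i^*(u) \, h(u) \, du + \sum_{i=1}^m u_i(t) \frac{d}{dt} \int_a^t u_i^*(u) \, h(u) \, du.$$

Since $u_i^*$ and $h$ are in $L^2$,

$$\sum_{i=1}^m u_i(t) \frac{d}{dt} \int_a^t u_i^*(u) \, h(u) \, du = h(t) \sum_{i=1}^m u_i(t) \, u_i^*(t)$$

almost everywhere $t$. But, by definition of $W$ and the $u_i^*$'s, this is equal to

$$h(t) \sum_{i=1}^m [W(t)]_{i1} [W(t)^{-1}]_{mi} = h(t) \, [W(t)^{-1} W(t)]_{m1} = h(t) \, I\{m = 1\}.$$

Therefore, for $m = 1$, (A.8) holds and for $m > 1$ (A.7) holds when $j = 1$. For
$m > 1$ and $j > 1$, we can calculate derivatives of $r$ up to order $m - 1$, and
can calculate the $m$th derivative almost everywhere to prove (A.7) and (A.8).
Clearly, the $m$th derivative in (A.8) is square-integrable. Therefore we've proven
(A.4).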

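To prove (A.5), use (A.7) and (A.8) and write

$$(Lr)(t) = r^{(m)}(t) + \sum_{j=0}^{m-1} \omega_j(t) \, r^{(j)}(t)$$

$$= h(t) + \sum_{i=1}^m u_i^{(m)}(t) \int_a^t u_i^*(u) \, h(u) \, du + \sum_{j=0}^{m-1} \omega_j(t) \sum_{i=1}^m u_i^{(j)}(t) \int_a^t u_i^*(u) \, h(u) \, du$$

$$= h(t) + \sum_{i=1}^m \left[ u_i^{(m)}(t) + \sum_{j=0}^{m-1} \omega_j(t) \, u_i^{(j)}(t) \right] \int_a^t u_i^*(u) \, h(u) \, du$$

$$= h(t)$$

since $L u_i = 0$.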

Equation (A.6) follows directly from (A.7) by taking $t = a$. □

Proof of Theorem A.5. First consider the function in equation (A.3) as a function
of $u$ with $t$ fixed. Since the $u_i$'s are continuous and $W(u)$ is invertible for all
$u$, $G(t, \cdot)$ is continuous on $[a, t]$ and zero on $(t, b]$, hence bounded on the finite
closed interval $[a, b]$. Thus it is in $L^2[a, b]$.

To show that equation (A.1) holds, let $f \in \mathcal{H}^m[a, b]$ satisfy the boundary
conditions (A.2). Define $r(t) = \int_a^b G(t,u) \, (Lf)(u) \, du$. Then, by Theorem A.6, $Lr = Lf$
almost everywhere and $r^{(j)}(a) = 0$, $j = 0, \ldots, m - 1$. Thus $L(r - f) = 0$
almost everywhere and $(r - f)^{(j)}(a) = 0$, $j = 0, \ldots, m - 1$. By Theorem A.4,
$r - f$ is the zero function, that is $r = f$. □
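Theorem A.6 can be illustrated numerically for the operator of Example 2, $Lf = f'' + \gamma f'$ with $a = 0$, whose Green's function is $G(t,u) = (1 - \exp(-\gamma(t-u)))/\gamma$ for $u \le t$. The sketch below, assuming NumPy, computes $r(t) = \int_a^t G(t,u) h(u) \, du$ by the trapezoid rule and approximates $(Lr)(t)$ by central differences. The function names, step size and grid size are hypothetical choices, and the finite-difference check is only accurate to a few digits.

```python
import numpy as np

def r_of(t, h, gamma, a=0.0, n_grid=20_001):
    """r(t) = integral over [a, t] of G(t, u) h(u) du for Example 2's Green's function."""
    u = np.linspace(a, t, n_grid)
    vals = (1.0 - np.exp(-gamma * (t - u))) / gamma * h(u)
    du = u[1] - u[0]
    return float(du * (vals.sum() - 0.5 * (vals[0] + vals[-1])))

def apply_L(t, h, gamma, step=1e-2):
    """Central-difference approximation of r''(t) + gamma r'(t)."""
    r_minus, r_mid, r_plus = (r_of(t + k * step, h, gamma) for k in (-1, 0, 1))
    second = (r_plus - 2.0 * r_mid + r_minus) / step**2
    first = (r_plus - r_minus) / (2.0 * step)
    return second + gamma * first
```

With `h = np.cos` and `gamma = 1.5`, `apply_L(0.6, np.cos, 1.5)` should be close to `np.cos(0.6)`, in line with (A.5), and `r_of(0.0, np.cos, 1.5)` is zero, in line with (A.6) for $j = 0$.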